Chapter 11: Frequency Analysis
Welcome to the online content for Chapter 11!
As always, I’ll assume that you’ve already read up to this chapter of the book and worked through the online content for the previous chapters. If not, please do that first.
Click the ‘Run Code’ buttons below to execute the R code, waiting until each button says ‘Run Code’ before you press it. Be careful to run the boxes in order, because later boxes may depend on things done in earlier ones.
Chi-squared test for independence
Let’s begin by reading in the data set that I used in the chapter.
This time, we use read.table, rather than read.csv, because we want R to know that the data is a 2-way table, in which the contents of the first column are the row headings, and not the levels of another variable!
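As a minimal sketch of this step, the code below reads a small 2-way table with read.table. The table text, its labels and its counts are made up for illustration (they are not the chapter's data set), and the arguments header = TRUE and row.names = 1 are my assumption about one reasonable way to tell R that the first column holds row headings.

```r
# Hypothetical 2x2 frequency table, typed inline so the example runs without a file.
# The labels and counts are made up; they are NOT the chapter's data.
table_text <- "Group Yes No
GroupA 12 5
GroupB 4 13"

# header = TRUE: the first line holds the column headings.
# row.names = 1: the first column holds the row headings, not a variable.
freq <- read.table(text = table_text, header = TRUE, row.names = 1)

freq        # the table, with GroupA and GroupB as row names
dim(freq)   # 2 rows and 2 columns of counts
```

Because the first column has become the row names, only the two columns of counts remain as data.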
To run a chi-squared test for independence, we use the chisq.test function in R:
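Here is a sketch of what such a call can look like. The counts are hypothetical (not the chapter's data), so the output will not match the chapter's values. Note that for a 2x2 table, chisq.test applies a continuity correction by default; I don't know whether the chapter's analysis did the same.

```r
# Hypothetical 2x2 table of counts (NOT the chapter's data).
freq <- matrix(c(12, 5,
                  4, 13),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GroupA", "GroupB"), c("Yes", "No")))

# Chi-squared test for independence between rows and columns.
# For 2x2 tables R applies Yates' continuity correction unless correct = FALSE.
result <- chisq.test(freq)

result            # test statistic, degrees of freedom and p value
result$expected   # expected counts under independence
```

The expected counts are worth checking, because the chi-squared approximation becomes unreliable when they are small.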
We get the values discussed in the chapter.
Fisher’s exact test
To run Fisher’s exact test, we just use the fisher.test function:
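A minimal sketch of the call follows. Again, the counts are hypothetical rather than the chapter's data, so the p value it prints will not be the one reported in the chapter.

```r
# Hypothetical 2x2 table of counts (NOT the chapter's data).
freq <- matrix(c(12, 5,
                  4, 13),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GroupA", "GroupB"), c("Yes", "No")))

# Fisher's exact test; for a 2x2 table the default alternative is two-sided.
result <- fisher.test(freq)

result           # p value, plus an odds-ratio estimate for 2x2 tables
result$p.value   # the p value on its own
```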
We get the \(p\) value of .009 that I mentioned in the chapter.
Chi-squared goodness-of-fit test
As there are just a few numbers needed, let’s not bother reading in a data file. Instead, we’ll just create the variables ourselves, by typing in the numbers. For example, to put the numbers 12, 5, 23, 3, 10 and 7 into a variable, we use the c(…) function in R, where c stands for ‘concatenate’ (bring together).
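Using the six numbers given above and the variable name ‘observed’, this step can be written as:

```r
# Put the six observed frequencies into one variable.
observed <- c(12, 5, 23, 3, 10, 7)

length(observed)   # how many values were stored
sum(observed)      # their total
```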
Now, if we enter the variable name ‘observed’, we get the list of numbers that we put in:
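For example (the assignment is repeated so that this box runs on its own):

```r
observed <- c(12, 5, 23, 3, 10, 7)

# Typing a variable's name on its own line prints its contents.
observed
# [1] 12  5 23  3 10  7
```

The [1] at the start of the output just tells us that the line begins with the first element.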
We can do the same for the expected values that we want to compare our observed values with:
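A sketch of this step, assuming (from the remark about typing ‘10’ six times) that the expected values are six 10s:

```r
# Expected frequencies: 10 in each of the six categories (assumed, see above).
expected <- c(10, 10, 10, 10, 10, 10)

expected
sum(expected)   # the total of the expected frequencies
```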
There are slightly quicker ways of doing this. Instead of typing ‘10’ six times, we could use the rep function (short for ‘replicate’):
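A short sketch of the rep version, with a check that it gives the same result as typing the values out:

```r
# rep(x, times) repeats the value x the given number of times.
expected <- rep(10, 6)

expected

# Same values as typing them out by hand.
identical(expected, c(10, 10, 10, 10, 10, 10))
```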
Now, we can test the null hypothesis that ‘observed’ is a random sample from a population in which the categories occur in the proportions given by ‘expected’:
Here, we have to put p=expected/60 because the p argument expects probabilities that add up to 1. Dividing each expected frequency by 60 converts it into the proportion that we want to compare our observed values with. (We use 60 because the total of the six expected values is 60, as is the total of the six observed values.)
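Putting the pieces together, a sketch of the goodness-of-fit test with the numbers given in this section (assuming the expected values are six 10s) looks like this. The rescale.p line is an alternative I've added for comparison, not something the chapter uses.

```r
observed <- c(12, 5, 23, 3, 10, 7)
expected <- rep(10, 6)   # assumed: six expected frequencies of 10

# Goodness-of-fit test: p must be probabilities that sum to 1.
result <- chisq.test(observed, p = expected / 60)
result
# X-squared = 25.6, df = 5

# Alternative: let R rescale the expected frequencies into probabilities.
result2 <- chisq.test(observed, p = expected, rescale.p = TRUE)
result2$statistic   # same test statistic as above
```

Dividing by sum(expected) instead of typing 60 would work too, and avoids a mistake if the expected values change.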
We obtain the values discussed in the chapter.
Type or paste code into the code box below to compare ‘observed’ with the 10, 10, 20, 5, 5, 10 set of expected frequencies discussed in the chapter.
Check that you obtain the same results that I mentioned in the chapter.
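If you get stuck, here is one possible way of writing it. The variable name expected2 is my own choice; any name would do.

```r
observed  <- c(12, 5, 23, 3, 10, 7)
expected2 <- c(10, 10, 20, 5, 5, 10)   # hypothetical variable name

# Divide by the total so that p holds probabilities that sum to 1.
result <- chisq.test(observed, p = expected2 / sum(expected2))
result
```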